[rllib] Fix custom model-config mismatch between env-runner and learner #58739
Conversation
Signed-off-by: Mark Towers <mark@anyscale.com>
Code Review
This pull request addresses a model configuration mismatch between the environment runner and the learner by correctly merging the algorithm's base model config with the RLModuleSpec's custom config. It also introduces a valuable error check in TorchRLModule.set_state to detect architecture mismatches when loading state into an inference_only module. My review includes a critical fix for the config merging logic to prevent a TypeError when the model config is a dataclass and to ensure compatibility with Python versions older than 3.9.
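For reference, the merge described above could look roughly like the following. This is a minimal sketch, not RLlib's actual internals; `merge_model_config` and its arguments are illustrative names.

```python
import dataclasses


def merge_model_config(base_config, custom_config):
    """Merge the algorithm's base model config with an RLModuleSpec's custom config."""
    # The base config may be a dataclass instance (e.g. a DefaultModelConfig),
    # which would make dict-style merging raise a TypeError, so convert it first.
    if dataclasses.is_dataclass(base_config):
        base_config = dataclasses.asdict(base_config)
    # Use dict unpacking rather than the `|` operator so the code also runs on
    # Python versions older than 3.9; keys from the custom config take precedence.
    return {**(base_config or {}), **(custom_config or {})}
```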
LGTM
raise ValueError(
    "Architecture mismatch detected when loading state into inference_only module! "
    f"Missing parameters (not found in source state): {list(missing_keys)} "
    "This usually indicates the learner and env-runner have different architectures."
)
Let's please be a little more precise here.
-> What does having a different architecture mean here?
Basically, this error should give the user a good clue about what they're doing wrong.
Good spot, it should probably reference the layer names that are different.
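For example, the message could enumerate the parameter (layer) names that differ, roughly along these lines. This is only a sketch; the wording and helper name here are illustrative, and the final message in the PR may differ.

```python
def architecture_mismatch_error(missing_keys):
    # List the concrete parameter names so the user can see which layers the
    # env-runner module defines but the learner state does not provide.
    return ValueError(
        "Architecture mismatch detected when loading state into an "
        f"inference_only module! Missing parameters: {sorted(missing_keys)}. "
        "This usually means the learner and env-runner built their RLModules "
        "from different model configs."
    )


# Example with hypothetical layer names for a policy-head MLP.
print(architecture_mismatch_error(["pi.net.mlp.2.weight", "pi.net.mlp.2.bias"]))
```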
Signed-off-by: Mark Towers <mark@anyscale.com>
Description
The algorithm config isn't updating `rl_module_spec.model_config` when a custom one is specified, which means the learner and env-runner build their modules from different model configs. As a result, the env-runner model wasn't being updated. The reason this problem wasn't detected previously is that the model state-dict was loaded with `strict=False`. Therefore, I've added an error check that the missing keys must always be empty; it detects when the env-runner module is missing components that are present in the learner's updated model.
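The mismatch can be reproduced with the script below (from this PR's description); before the fix, the learner's and env-runners' `encoder.config.shared` values disagree.

```python
from ray.rllib.algorithms import PPOConfig
from ray.rllib.core.rl_module import RLModuleSpec
from ray.rllib.policy.sample_batch import DEFAULT_POLICY_ID

config = (
    PPOConfig()
    .environment('CartPole-v1')
    .env_runners(
        num_env_runners=0,
        num_envs_per_env_runner=1,
    )
    .rl_module(
        rl_module_spec=RLModuleSpec(
            model_config={
                # This used to cause an encoder.config.shared mismatch.
                "head_fcnet_hiddens": (32,),
            }
        )
    )
)
algo = config.build_algo()

learner_module = algo.learner_group._learner._module[DEFAULT_POLICY_ID]
env_runner_modules = algo.env_runner_group.foreach_env_runner(
    lambda runner: runner.module
)
print(f'{learner_module.encoder.config.shared=}')
print(f'{[mod.encoder.config.shared for mod in env_runner_modules]=}')

algo.train()
```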
Related issues
Closes #58715